Created on 5 October 2023.
This is a tutorial on loading, preprocessing and visualising BedMachine data with Python. I hope it can also be of use for loading similar .nc (netCDF) polar climate data sets (see my GitHub repository for other Antarctica data sets) or more generally, spatial data.
Different packages and functions were developed for datatypes like images, matrices, tensors, or gridded spatial data, who all have special conventions. While we can utilise many such packages and functions for our data set, particularly the following two conventions change across them:
Firstly, we load all packages.
import xarray as xr
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook'
import torch
import pyproj
import numpy as np
import cartopy.crs as ccrs # Projections list
import cartopy.feature as cfeature # for coastlines
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches # draw domain boundry patches
# to extract versions of packages used
import watermark
Data: MEaSUREs BedMachine Antarctica, Version 3
User guide: Link to pdf
Download the data from the US National Snow and Ice Data Center (NSIDC) to your machine. We then load it into an xarray object.
# Load with xarray, change path
xr_bedmachine = xr.open_dataset("/Users/kimbente/BedMachineAntarctica-v3.nc")
# Look at data
xr_bedmachine
<xarray.Dataset>
Dimensions: (x: 13333, y: 13333)
Coordinates:
* x (x) int32 -3333000 -3332500 -3332000 ... 3332000 3332500 3333000
* y (y) int32 3333000 3332500 3332000 ... -3332000 -3332500 -3333000
Data variables:
mapping |S1 ...
mask (y, x) int8 ...
firn (y, x) float32 ...
surface (y, x) float32 ...
thickness (y, x) float32 ...
bed (y, x) float32 ...
errbed (y, x) float32 ...
source (y, x) int8 ...
dataid (y, x) int8 ...
geoid (y, x) int16 ...
Attributes: (12/17)
Conventions: CF-1.7
Title: BedMachine Antarctica
Author: Mathieu Morlighem
version: 03-Jun-2022 (v3.4)
nx: 13333.0
ny: 13333.0
... ...
ymax: 3333000
spacing: 500
no_data: -9999.0
license: No restrictions on access or use
Data_citation: Morlighem M. et al., (2019), Deep glacial tr...
Notes: Data processed at the Department of Earth Sy...Observations:
Let's learn how to extract data from the xarray object:
# This is how you extract the 1D array of all y coordinate values:
xr_bedmachine.coords["y"].values
print("Shape of y coordinate array:", xr_bedmachine.coords["y"].values.shape)
# This is how you extract the 2D array of all bed elevation values:
xr_bedmachine["bed"].values
print("Shape of bed elevation array:", xr_bedmachine["bed"].values.shape)
Shape of y coordinate array: (13333,) Shape of bed elevation array: (13333, 13333)
We now only want to select a 100 x 100 scene that we will visualise in the following. We use .sel to subset the data set by (Polar Stereographic) coordinate values.
# Define H/W size of scene in meters (unit of the coordinate system.)
# While the scene is 100 pixel wide x 500 meter resolution = 50,000 m = 50.0 km wide and high,
# the grid cell centroids on the outer boundries of the scene will only be
# 49500 m apart (250 meters inwards from either side)
scene_size = 49500
# Select y-coordinate (lower edge)
scene_y_min = - 601500
# (upper edge)
scene_y_max = scene_y_min + scene_size
print("(y min coordinate, y max coordinate): ", scene_y_min, ",", scene_y_max)
# Select x-coordinate (left edge)
scene_x_min = 1034000
# (right edge)
scene_x_max = scene_x_min + scene_size
print("(x min coordinate, x max coordinate): ", scene_x_min, ",", scene_x_max)
# !! y slicing is ordered y_max, y_min, cooresponding to high to low ordering of coordinate index.
xr_bedmachine_scene = xr_bedmachine.sel(y = slice(scene_y_max, scene_y_min),
x = slice(scene_x_min, scene_x_max))
# Look at subsetted data set
xr_bedmachine_scene
(y min coordinate, y max coordinate): -601500 , -552000 (x min coordinate, x max coordinate): 1034000 , 1083500
<xarray.Dataset>
Dimensions: (x: 100, y: 100)
Coordinates:
* x (x) int32 1034000 1034500 1035000 ... 1082500 1083000 1083500
* y (y) int32 -552000 -552500 -553000 ... -600500 -601000 -601500
Data variables:
mapping |S1 ...
mask (y, x) int8 ...
firn (y, x) float32 ...
surface (y, x) float32 ...
thickness (y, x) float32 ...
bed (y, x) float32 9.606 9.756 9.616 9.321 ... -64.81 -65.57 -66.48
errbed (y, x) float32 ...
source (y, x) int8 ...
dataid (y, x) int8 ...
geoid (y, x) int16 ...
Attributes: (12/17)
Conventions: CF-1.7
Title: BedMachine Antarctica
Author: Mathieu Morlighem
version: 03-Jun-2022 (v3.4)
nx: 13333.0
ny: 13333.0
... ...
ymax: 3333000
spacing: 500
no_data: -9999.0
license: No restrictions on access or use
Data_citation: Morlighem M. et al., (2019), Deep glacial tr...
Notes: Data processed at the Department of Earth Sy...We choose to now extract the variables of interest and convert them to a PyTorch tensor. PyTorch is a popular deep learning package and it has many useful e.g. Computer Vision algorithms implemented. Images (i.e. scenes) in torch are usually retresented as [N, C, H, W] tensors, where N is the number of images (here N = 1), C is the number of channels, H is the dimensionality in Height direction, and W is the dimensionality in Width direction.
We will only extract the bed elevation values, the surface elevation values and the y and x coordinates. We concatinate them into a [C, H, W] tensor with C = 4. Using .unsqueeze() to create explicit dimensions (extra dimensionality 1), we can convert it into a [N, C, H, W] tensor of shape [1, 4, 100, 100], where
[:, 0, :, :] is the bed elevation channel.[:, 1, :, :] is the surface elevation channel.[:, 2, :, :] is the y coordinate channel.[:, 3, :, :] is the x coordinate channel.# Extract X or y dimensionality
scene_dimensions = len(xr_bedmachine_scene.coords["x"].values)
scene_tensor = torch.cat((torch.tensor(xr_bedmachine_scene.bed.values).unsqueeze(0),
# Convert to tensor and create explicit first dimension to enable concatination.
torch.tensor(xr_bedmachine_scene.surface.values).unsqueeze(0),
# YX order
# Copy 1D y column vectors for all columns (all x values) to create a 2D field.
torch.tensor(xr_bedmachine_scene.coords["y"].values).unsqueeze(-1).repeat(1, scene_dimensions).unsqueeze(0),
# Copy 1D x row vectors for all rows (all y values)
torch.tensor(xr_bedmachine_scene.coords["x"].values).repeat(scene_dimensions, 1).unsqueeze(0)),
# Concatinate along first (index 0) dim
dim = 0).unsqueeze(0)
# Tensors have no names for dims. Common tensor shape is N, C, H, W
scene_tensor.shape
# Can be saved as .pt file
torch.Size([1, 4, 100, 100])
We use imshow() (image show) from plotly express.
# Use matplotlib colorscaled in Plotly
# Good colorscales from matplotlib
# plt.cm.winter
# plt.cm.summer
# plt.cm.cubehelix
# plt.cm.ocean
# Extract 5 colors from colormap
winter_cmap = matplotlib.cm.get_cmap('winter', 5)
for i in range(winter_cmap.N):
rgba = winter_cmap(i)
print(matplotlib.colors.rgb2hex(rgba))
winter_color_continuous_scale = [[0, '#0000ff'], [0.25, '#0040df'], [0.5, '#0080bf'], [0.75, '#00bf9f'], [1, '#00ff80']]
#0000ff #0040df #0080bf #00bf9f #00ff80
# Visualising only the surface elevation channel: Select first & only (index 0) image and second (index 1) channel.
px.imshow(scene_tensor[0, 1],
color_continuous_scale = "ice",
# color_continuous_scale = winter_color_continuous_scale,
labels = dict(x = "x index", y = "y index"),
title = "Surface elevation of one East Antarctic scene, using z values only")
Observation:
Let's visualise the data with the coordinate values. We must specifc origin = "lower", because origin = "upper" is the default, and this would revert the order of the y-axis.
px.imshow(img = scene_tensor[0, 1],
x = scene_tensor[0, 3][0, :],
y = scene_tensor[0, 2][:, 0],
origin = "lower",
color_continuous_scale = "ice",
labels = dict(x = "x coordinate", y = "y coordinate"),
title = "Surface elevation of one East Antarctic scene, using z, x, and y values")
We can also visualise for instance the surface topography in 3D using Plotly go.
Be careful when visualising the data by specifying the z values only. Because the defaults are different, we need to flip the y-axis of the tensor, to match the default representation. We also change the orientation of the x and y axis to have the x-y origin facing us.
# !! Not advised to visualise data without x and y axis
# Flips ordering of y axis from high-to-low to low-to-high
fig = go.Figure(data = [go.Surface(z = torch.flip(scene_tensor[0, 1], dims = (0,)),
colorscale = "ice")])
fig.update_layout(height = 600, width = 800, template = "plotly_white")
# Optional: Change view angle
# fig.update_scenes(camera_eye_x = 1.25)
# fig.update_scenes(camera_eye_y = 1.25)
# fig.update_scenes(camera_eye_z = 1.5)
fig.update_scenes(xaxis_title_text = 'x index',
yaxis_title_text = 'y index',
zaxis_title_text = 'surface elevation')
# Change view
fig.update_scenes(xaxis_autorange = "reversed")
fig.update_scenes(yaxis_autorange = "reversed")
fig.show()
# Flipping not needed if we add the true x and y values as input.
fig = go.Figure(data = [go.Surface(z = scene_tensor[0, 1], x = scene_tensor[0, 3], y = scene_tensor[0, 2],
colorscale = "ice")])
fig.update_layout(height = 600, width = 800, template = "plotly_white")
fig.update_scenes(xaxis_title_text = 'x coordinate',
yaxis_title_text = 'y coordinate',
zaxis_title_text = 'surface elevation')
# Reverse both axis to have x-y origin facing front corner
fig.update_scenes(xaxis_autorange = "reversed")
fig.update_scenes(yaxis_autorange = "reversed")
fig.show()
However, this visualisation heavily distorts the change in elevation. We actually only have a differance of ~ 90 m stetching across 50 x 50 km. The selected region is rather a plateau than a mountainous area. Let's find a more accurate representation, specifying the aspect ratio from the data, since x, y and z all have unit "meters".
# Flipping not needed if we add the true x and y values as input.
fig = go.Figure(data = [go.Surface(z = scene_tensor[0, 1], x = scene_tensor[0, 3], y = scene_tensor[0, 2],
colorscale = "ice")])
fig.update_layout(template = "plotly_white")
# Reverse both axis to have x-y origin facing front corner
fig.update_scenes(xaxis_autorange = "reversed")
fig.update_scenes(yaxis_autorange = "reversed")
fig.update_scenes(xaxis_title_text = 'x coordinate',
yaxis_title_text = 'y coordinate',
zaxis_title_text = 'z')
# Specify aspect ratio from data
fig.update_scenes(aspectmode = "data")
# Zoom out to displace full scene
fig.update_scenes(camera_eye_x = 8.0)
fig.update_scenes(camera_eye_y = 8.0)
fig.update_scenes(camera_eye_z = 8.0)
fig.show()
Polar data typically comes in a projection format that is optimised for the poles. Equal-Area Scalable Earth (EASE) Grids (e.g. epsg:6933), Polar Stereographic Projections epsg:3031 (for north and south) and Rotated Pole are the most common. BedMachine comes in WGS 84 / Antarctic Polar Stereographic. See epsg:3031 for more details.
For interoperability, we might need to integrate data with other source projections, or we might need to re-project our data input standard geo-referencing systems.
A great tool for sporadic conversions - without opening Python every time - is the Polar Geospatial Centre (PGC) coordinate converter.
Example:
Coordinates in the top right corner of the article.Otherwise, we can use pyproj for projection and mapping tasks. Recall that a projection is a mapping between points (pairs of x-y coordinates) defined in a source coordinate reference system and points in a target reference system. We map from the Polar Stereographic projection to a regular Lon Lat projection in the following:
# Projection details are saved as attributes in the .nc file
print("Projection:", xr_bedmachine.attrs["Projection"])
print("Proj4 projection code:", xr_bedmachine.attrs["proj4"])
Projection: Polar Stereographic South (71S,0E) Proj4 projection code: +init=epsg:3031
# This is how projections work. check through tool
polarstereo_to_lonlat = pyproj.Transformer.from_crs(crs_from = pyproj.CRS("epsg:3031"),
crs_to = pyproj.CRS("epsg:4326"),
always_xy = True) # xy order convention
# Pass in x and y meshgrids to return lon and lat meshgrids
lon_array, lat_array = polarstereo_to_lonlat.transform(scene_tensor[0, 3], scene_tensor[0, 2])
# Print for first point
print("x y (polar stereo):", scene_tensor[0, 3, 0, 0].item(), scene_tensor[0, 2, 0, 0].item())
print("lon lat (DD):", np.round(lon_array[0, 0], 4), np.round(lat_array[0, 0], 4))
x y (polar stereo): 1034000.0 -552000.0 lon lat (DD): 118.0955 -79.2427
base_color = "#0000ff"
scene_color = "#00bf9f"
hfont = {'fontname':'Helvetica'}
# Initialise plot
fig = plt.figure(figsize = [6, 6])
ax = plt.axes(projection = ccrs.SouthPolarStereo())
# hides circular boundry line
ax.axis('off')
# restrict to lat < -65
ax.set_extent([- 180, 180, - 90, - 65], ccrs.PlateCarree())
# add grey land
ax.add_feature(cfeature.LAND, facecolor = ("#FAFAFA"), alpha = 1.0)
# thin coastline lines
ax.add_feature(cfeature.COASTLINE, edgecolor = base_color, linestyle = '-', linewidth = 0.8, alpha = 0.7)
# Add rectable for scene
ax.add_patch(mpatches.Rectangle(xy = [scene_x_min, scene_y_min],
width = scene_size,
height = scene_size,
facecolor = 'none', edgecolor = scene_color, linewidth = 0.8,
transform = ccrs.SouthPolarStereo()))
# Add Scene label
plt.annotate("Scene", (scene_x_min - 100000, scene_y_min + 150000), **hfont, color = scene_color,
transform = ccrs.SouthPolarStereo())
# Add Dome C marker
ax.scatter(1359993, - 894443,
color = base_color, marker = "x",
transform = ccrs.SouthPolarStereo())
# Add Dome C label
plt.annotate("Dome C", (1359993 - 100000, - 894443 + 150000), **hfont, color = base_color,
transform = ccrs.SouthPolarStereo())
plt.show()
# Python package versions used
%load_ext watermark
%watermark --python
%watermark --iversions
Python implementation: CPython Python version : 3.9.16 IPython version : 8.10.0 torch : 1.13.1 matplotlib: 3.6.3 pyproj : 3.4.1 plotly : 5.13.1 watermark : 2.3.1 numpy : 1.23.5 xarray : 2022.11.0 cartopy : 0.21.1